Using Corpora to Develop Limited-Domain Speech Translation Systems
Abstract
The paper describes the Spoken Language Translator (SLT) system, a prototype automatic speech translator. SLT is currently capable of translating spoken English queries in the domain of air travel planning into either Swedish or French, using a vocabulary of about 1200 words. We present an overview of the system's architecture, concentrating on how rationally constructed balanced corpora are used to allow rapid development of high-quality limited-domain translation systems.
Similar resources
SMT-based ASR domain adaptation methods for under-resourced languages: Application to Romanian
This study investigates the possibility of using statistical machine translation to create domain-specific language resources. We propose a methodology that aims to create a domain-specific automatic speech recognition (ASR) system for a low-resourced language when in-domain text corpora are available only in a high-resourced language. Several translation scenarios (both unsupervised and semi-su...
Parallel Texts Extraction from Multimodal Comparable Corpora
Statistical machine translation (SMT) systems depend on the availability of domain-specific bilingual parallel text. However, parallel corpora are a limited resource and are often not available for some domains or language pairs. We analyze the feasibility of extracting parallel sentences from multimodal comparable corpora. This work extends the use of comparable corpora by using audio sour...
Corpus-Centered Computation
To achieve translation technology that is adequate for speech-to-speech translation (S2S), this paper introduces a new attempt named Corpus-Centered Computation (abbreviated to C³ and pronounced c-cube). As opposed to conventional approaches adopted by machine translation systems for written language, C³ places corpora at the center of the technology. For example, translation knowledge is extrac...
Building a Parallel Corpus for Monologues with Clause Alignment
Many studies have been reported in the domain of speech-to-speech machine translation systems for travel conversation use. Therefore, a large number of travel domain corpora have become available in recent years. From a wider viewpoint, speech-to-speech systems are required for many purposes other than travel conversation. One of these is monologues (e.g., TV news, lectures, technical presentat...
Construction of Chinese Segmented and POS-tagged Conversational Corpora and Their Evaluations on Spontaneous Speech Recognitions
The performance of a corpus-based language and speech processing system depends heavily on the quantity and quality of the training corpora. Although several famous Chinese corpora have been developed, most of them are mainly written text. Even for some existing corpora that contain spoken data, the quantity is insufficient and the domain is limited. In this paper, we describe the development o...
Publication date: 1995